Realistic Synthetic Data for Testing Association Rule Mining Algorithms for Market Basket Databases

Authors

  • Colin Cooper
  • Michele Zito
Abstract

Association rule mining (ARM) is an important subtask in Knowledge Discovery in Databases. Existing ARM algorithms have largely been tested using artificial data generated by the QUEST program developed by Agrawal et al. [2]. Concerns have previously been raised [7, 25] about the significance of such sample data. We provide the first theoretical investigation of the statistical properties of the databases generated by the QUEST program. Motivated by the claim (supported by empirical evidence) that item occurrences in real-life market basket databases follow a rather different pattern, we then propose an alternative model for generating artificial data. We claim that this model is simpler than QUEST and generates structures that are closer to real-life market basket data.
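For readers new to the setting, the following is a minimal sketch of the two quantities that ARM algorithms typically compute on a market basket database: the support of an itemset and the confidence of a rule. It is an illustration only. It is neither the QUEST generator nor the model proposed in the paper, and the function names and the toy baskets are hypothetical.

```python
# Illustrative only: toy support/confidence computation on a tiny
# market basket database. Names and data are hypothetical and are
# not taken from the paper or from the QUEST program.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Each transaction (basket) is a set of item identifiers.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

if __name__ == "__main__":
    print(support(baskets, {"bread", "milk"}))
    print(confidence(baskets, {"bread"}, {"milk"}))
```

A synthetic data generator is useful for testing only insofar as statistics like these, computed on its output, resemble those observed in real baskets.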

Similar resources

On the Optimality of Association-rule Mining Algorithms

Since its introduction close to a decade ago, the problem of efficient mining of association rules on market-basket data has attracted tremendous attention. Numerous algorithms have been proposed, each one in turn claiming to outperform its predecessors on a representative set of databases. In this paper, we first focus our attention on the question of how much space remains for performance imp...

RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases

Traditionally, research in the area of frequent itemset mining has focused on mining market basket data. Several algorithms and techniques have been introduced in the literature for mining data represented in basket data format. The primary objective of these algorithms has been to improve the performance of the mining process. Unlike basket data representation, no algorithms exist for mining f...

On-Line Analytical Mining of Association Rules

With the wide application of computers and automated data collection tools, massive amounts of data have been continuously collected and stored in databases, which creates an imminent need and great opportunities for mining interesting knowledge from data. Association rule mining is one kind of data mining technique that discovers strong association or correlation relationships among data. The d...

A Pragmatic Approach on Association Rule Mining and its Effective Utilization in Large Databases

This paper deals with the effective utilization of association rule mining algorithms in large databases, especially those used by business organizations, where the number of transactions and items plays a crucial role in decision making. Frequent item-set generation and the creation of strong association rules from the frequent item-set patterns are the two basic steps in association rule mining. We...

Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm

... as a single objective one. Measures such as support, confidence, and other interestingness criteria, which are used for evaluating a rule, can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...

Publication date: 2007